An open dataset for individual tree detection in UAV LiDAR point clouds and RGB orthophotos in dense mixed forests

We present an open access dataset for development, evaluation, and comparison of algorithms for individual tree detection in dense mixed forests. The dataset consists of a detailed field inventory and overlapping UAV LiDAR and RGB orthophoto, which make it possible to develop algorithms that fuse multimodal data to improve detection results. Along with the dataset, we describe and implement a basic local maxima filtering baseline and an algorithm for automatically matching detection results to the ground truth trees for detection algorithm evaluation.

for publishing this dataset is the NewFor Alpine benchmark open dataset 11 .It provides a similar collection of field inventory ground plots in the Alpine area covered by a UAV LiDAR survey.The main differences and similarities between our dataset and the NewFor benchmark are described below.

Materials and methods
This section provides detailed descriptions of the dataset, a simple matching algorithm for evaluation of detection results, and a basic baseline approach for individual tree detection to benchmark against.

Dataset
The dataset consists of a field inventory survey and overlapping UAV LiDAR point clouds and RGB orthophotos.The study area is located in Perm Krai, Russia, approximately 90 km to the east of the region's administrative center, Perm.The forest in this area is mixed and dense, with complex and irregular canopy structure.The field survey consists of 3600 trees across 10 rectangular ground plots 100 m in length and 50 m in width.Every tree is represented as a point in the UTM 40N coordinate reference system (EPSG:32640).Every tree has a species label and diameter at breast height, measured with calipers at 1.3 m from the ground at two perpendicular directions and averaged.The inventory covers seven species of trees, approximately equally split among coniferous and deciduous: spruce, birch, fir, aspen, tilia, alder, and willow.Approximately 20% of the trees have height data measured during the inventory, and 10% have ages measured on core samples.
The LiDAR sensor used for the survey is AGM-MS3, with 640 kHz acquisition rate, 300-m range, and spatial accuracy of 3-5 cm.The mean height of data acquisition is 150 m, with the UAV configured to follow the terrain using the SRTM elevation map as a reference.The point clouds were preprocessed by removing duplicates and noise and classifying the points that represent the ground, which allows normalizing height within the point clouds.The duplicate removal was run with a threshold distance between points of 1 mm.Noise was removed by visually inspecting the point cloud and manually selecting height thresholds to cut off points that are lower than the ground or higher than the canopies.Height normalization allows treating the Z coordinate as height above ground rather than the absolute elevation, which simplifies many subsequent steps.The raw point clouds were processed with the combination of the AGM ScanWorks software from the sensor vendor and the TerraScan software.The average point density in the clouds is 37 points per square meter.
The camera used in the orthophoto survey is Sony A6000.The resolution of the RGB orthophoto is 7 cm per pixel.
The field inventory was conducted in May 2021, the UAV laser scanning and RGB surveys were conducted in July 2021, all in leaf-on conditions.
It is important to note that the locations of the trees recorded in the field survey are measured relatively close to the ground, while detection algorithms for UAV LiDAR point clouds usually detect the tops of the trees.There can be a significant distance in the XY plane between the location of the trunk at breast height and the highest point of the canopy because of tilt, complex canopy shape of broadleaf species, or both.
Figure 1 shows the locations of the ground plots over the full size RGB orthophoto on the left and a visualization of the field inventory and a point cloud in 2D for a single plot on the right.Figure 2 shows the distribution of species in the dataset.The dominant species in the data is spruce, but overall the trees are evenly split between deciduous and coniferous species.Figure 3 shows two of the point clouds in 3D, highlighting the differences in canopy structure complexity between predominantly deciduous and coniferous plots.

Comparison to other datasets
Our dataset is in many ways similar to the NewFor benchmark 11 .It serves the same purpose and also offers overlapping field survey ground plots and UAV LiDAR point clouds.There are, however, many notable differences.The NewFor benchmark covers much more diverse regions, including ground plots from France,  1. Ground plot statistics and baseline detection results.Point density is calculated as the number of points in the point cloud divided by the area of its 2D convex hull.Recall is the proportion of ground truth trees that have been detected, precision is the proportion of the detections that are actual ground truth trees, F1-score is the harmonic mean of the precision and recall.Distance is the average 2D distance between a detected tree and its match in the ground truth data.

Figure 3.
3D visualizations of point clouds from plots 1 and 6.Note the difference in the canopy structure: plot 1 is predominantly deciduous, while plot 6 is coniferous.There is no single local maxima filter window size that will produce satisfactory results for both of these plots at the same time.Italy, Switzerland, Austria, and Slovenia, while all our data comes from the same area.The tree species covered by the datasets are also different: both contain spruce and fir, but the NewFor data also has beech, Scots pine, larch, sycamore, and poplar, while ours also has birch, aspen, tilia, alder, and willow.Our dataset has more than twice as many individual trees as the Alpine benchmark, and the forest is denser, more complex, and more diverse, making it more complicated to detect trees in.Our data contains very mild terrain variations, while the slopes of the terrain in Alpine data are very steep.This plays a role during height normalization, since subtraction of steep terrain introduces artificial slope to the points in the canopy, changing the overall shape of the tree.The point clouds in our dataset are denser as well, making them more detailed, but more computationally heavy to work with.Our dataset has an additional information source-an RGB orthophoto that allows development of algorithms that fuse multimodal data, which, we believe, is a key to success in such complex environments.Our dataset has species labels for every surveyed tree, but only partial coverage of tree heights and no timber volume information at all.Another similar dataset is the NeonTreeEvaluation Benchmark 12 , which offers bounding box annotation for tree detection across a wide range of different forest types.It offers coregistered RGB, LiDAR, and hyperspectral images over 31,000 individual trees.The main difference in the reference data between the NeonTreeEvaluation Benchmark and our dataset is the source: our data comes from a field inventory and thus has additional tree information that can be used in downstream tasks, such as species classification or timber volume prediction, while the NeonTreeEvaluation Benchmark annotations are created from the RGB photo and thus only offer the positions and sizes of trees.
Another dataset is the IDTReeS 2020 Competition Data 13 aimed to develop algorithms for delineation and species classification of individual tree crowns in RGB, LiDAR, and hyperspectral data.It offers bounding box annotations for 1200 individual trees covered by RGB, LiDAR, and hyperspectral images in 3 national forests in the USA.Similarly, the source of the data is annotation of images, not a field inventory.
There are also datasets available that don't have LiDAR point cloud coverage, or use terrestrial LiDAR instead of UAV LiDAR, or use photogrammetric point clouds instead of LiDAR point clouds.We do not mention them here because we specifically focus on UAV LiDAR.

Baseline: local maxima filtering
We provide an implementation of a very basic baseline algorithm for detection of individual trees within a point cloud: local maxima filter.It is the core of many other detection methods, which often vary only on what the local maxima filter is applied on, with what parameters, how many times, and how the results are merged into the final detection.The version we used is the most basic of all: it is a one pass filter directly on the point cloud and is parametrized by a fixed size window and a threshold for height that excludes all low points from consideration.Note that the algorithm assumes that the point clouds are height normalized and treats the Z coordinate as height above ground.Our implementation first discards the points lower than the height threshold.Then it constructs a kd-tree for all the points in the point cloud, and iterates through unprocessed points, repeatedly querying for neighbors in a sphere of the specified window size and marking all points but the highest as processed.Once it encounters a point that is the highest in its neighborhood, it marks it as a local maximum.
As mentioned, the algorithm can be improved in many ways, the easiest of which is a careful tuning of the window size for every plot.However, this approach is a priori not good enough to work on such complex data.Its main principle is very naive, and it does not fully utilize the vertical structure information available in the point cloud and does not use RGB data at all.The main purpose of providing it is to publish the dataset with an end-to-end example of implementing a detection procedure.

Matching algorithm
To evaluate the results, an automated and deterministic method for matching detected trees to the ground truth data is required.It should determine which ground truth trees, if any, the detected candidates correspond to.It should also classify all the trees from both sets as either a true positive, meaning that the ground truth tree was successfully detected, a false negative, meaning that a ground truth tree was not detected, or a false positive, meaning a tree was detected when there is none.We implemented a matching procedure that considers the locations and heights of the trees, and falls back to using only locations when the height information is not available for the ground truth tree.It is parametrized by the maximum allowed 2D distance and the maximum height difference between a detected and a ground truth tree to consider them a match.First, it constructs a distance matrix between the ground truth tree locations and the detected candidates and filters out the pairs for which the distance is larger than the allowed maximum.It then iterates over the remaining pairs in the order of increasing distance, and marks the pairs with suitable heights in which both trees are not yet assigned a class as a true positive.The pairs in which one of the trees is already assigned a class are marked as either a false positive or a false negative, and the pairs in which both trees are assigned a class are skipped.Finally, it marks all unmatched ground truth trees as false negatives, and all unmatched candidate trees as false positives.
An example of the detection and matching results is presented in Fig. 4.Each true positive is a pair of points, one from the ground truth and the other from detected candidates.They are represented in the figure as yellow triangles connected by red lines.Errors are represented by red triangles.Numbers near the triangles are heightsall the detected tree candidates have them, but only part of the ground truth data does.

Results
Table 1 shows a summary of the main metrics achieved by the baseline algorithm for every plot, and Fig. 4 shows a fragment of its output processed by the matching algorithm.The metrics were achieved by running the local maxima filter with a window size of 2 m and a height threshold of 3 m.The matching algorithm was run with a maximum distance between matches of 5 m and a maximum height difference of 3 m.We report the metrics that are most commonly used to evaluate individual tree detection results.Recall is the proportion of ground truth trees that have been detected, it is the ratio of the number of true positives to the number of true positives and false negatives.Precision is the proportion of the detected trees that are successfully matched to ground truth trees, it is the ratio of the number of true positives to the number of true positives and false positives.F1-score aims to combine precision and recall into a single metric, it is the harmonic mean between them.Distance is the average distance between a detected tree and its corresponding ground truth tree.Note that the distance metric should be interpreted with care, since it mostly compares different points: the detection results in most algorithms correspond to the top of the canopy, while the ground truth point is the location of the trunk at breast height, so even perfect detection of every tree will result in non-zero distance error.Moreover, it is artificially limited from above by the parameters of the matching procedure.The average values for all of these metrics are around 50%, meaning that approximately half of the trees in the ground truth data are detected correctly, and there are as many false positives as there are false negatives.These values vary greatly with the window size of the local maxima filter.Making a window smaller increases recall but decreases precision, since there are many more detections, many of which are false positives.Running the algorithm with a window size of 1 m results in average recall of 86%, but precision of only 10% and F1-score 17%.Similarly, increasing the window size drops recall but boosts precision.Running the algorithm with a window size of 5 m results in average precision of 97%, but recall only 11% and F1-score 20%.A more thorough exploration of this dependency is presented in Table 2, which reports metrics for a range of possible values of the window size.
Note in Fig. 3 how different are the canopy structures of plots with predominantly coniferous and deciduous trees.It is evident that using the same parameters for the local maxima detection in both types of forests will lead to suboptimal results, which further complicates the development of universal algorithms for such data.
As mentioned in the description of the local maxima filtering algorithm, we do not expect good performance of such a simple algorithm on data of this complexity.It does not fully utilize 3D structure information available in the point cloud and does not use the RGB data at all.It is also not a good fit for detecting broadleaf tree species, especially in dense forest stands, since their canopies are usually complex, often without a single highest point at the middle of the canopy, but many such points in irregular shapes.The main reason for including it at all is to release the dataset together with an end-to-end example of implementing and evaluating a detection algorithm.The code for generating the reported results is available alongside the dataset in the hope of making it easy to understand and start working with the data.

Discussion
We present an open access dataset for development, evaluation, and comparison of algorithms for detection of individual trees within UAV LiDAR point clouds and RGB orthophotos in dense mixed forests.The dataset consists of a detailed field inventory of 3600 trees, with an even mix of deciduous and coniferous species, covered by a dense point cloud and a high resolution orthophoto.We released this dataset to be freely used for research purposes, along with implementations of a basic baseline detection algorithm and a matching procedure for evaluation.The data covers the type of forest for which simple algorithms cannot generate good enough results and which has been a target for research into more complex algorithms for a long time.Especially interesting is the possibility of developing algorithms that fuse multimodal data: active LiDAR point clouds and passive RGB images.Fusion of data from different sources is a reliable way to improve the quality of many tasks in remote sensing for forestry applications 14,15 .For forests that are as dense and complex as the one in the presented dataset, neither data source on its own has been enough to solve individual tree detection to an acceptable level of accuracy.We believe the most promising results will come from algorithms that utilize both, since they will have access to very different information about the same objects: color and texture from the RGB and 3D structure from the LiDAR.Even though the explored minimalistic baseline detection algorithm does not utilize the orthophoto in any way, the code that comes with the dataset has an example of a simple way to enrich the point cloud with color information by sampling the orthophoto.We hope that this dataset will be a useful addition to the field.

Figure 1 .
Figure 1.The study region and the data.Left: The locations of field survey plot boundaries on a full-size RGB orthophoto.Each plot is a 50 by 100 m rectangle, with every tree within measured and recorded.Buffered cutouts of the orthophoto come with the dataset, along with LiDAR point clouds.Right: A close up of plot number 4. Each colored point represents a single tree of a different species, on top of a point cloud colored by the value of the height of each point (lower points are dark, higher points are bright) and the same orthophoto.

Figure 2 .
Figure 2. The distribution of tree species in the dataset.The dominant species is spruce.The data is split almost evenly among coniferous and deciduous species.

Figure 4 .
Figure 4.A fragment of plot number 10 with the visualization of detection and matching results.Triangles that point upwards are trees from the field survey, triangles that point downwards are candidates detected by the algorithm.Red triangles represent trees that have not been matched between the detection results and the ground truth, and yellow triangles connected by red lines represent matches.The numbers are the heights of the trees.The detection was run with a window size of 2 m and a height threshold of 3 m.The matching algorithm was run with a max distance of 5 m and a max height difference of 3 m.

Table 2 .
Matching results for a range of local maxima filter window sizes.The size of the window controls the tradeoff between precision and recall: smaller windows lead to many more detections, many of which are false positives, thus increasing recall and decreasing precision, while larger windows lead to selection of only the most dominant trees, leaving many undetected and thus decreasing recall and increasing precision.F1-score reflects this disbalance.The value of the second parameter, height threshold, has almost no effect on the results and is thus not included in the exploration.